in a Notebook
You can work with Stata in a Python notebook by using the package ipystata. Just like rpy2, which allows us to use R in Python, ipystata lets us use both (or, if you want, all three!) programming languages in one notebook.
Setup
Let's start by importing all the packages we want to use.
import numpy as np
import pandas as pd
import ipystata
%pylab --no-import-all
%matplotlib inline
%%stata magic
In order to use ipystata you will need to use the %%stata magic. Let's see the help for it.
%%stata?
First example
Let's run some commands in Stata from this notebook. We will run the same code as in the Stata Notebook Example, this time through the %%stata magic.
%%stata
sysuse auto.dta
summ
desc
reg price mpg rep78 headroom trunk weight length turn displacement gear_ratio foreign, r
scatter price mpg, mlabel(make)
Notice that it returned everything except the graph. To get the graph, we need to pass the -gr option to the %%stata magic.
%%stata -gr
sysuse auto.dta
summ
desc
reg price mpg rep78 headroom trunk weight length turn displacement gear_ratio foreign, r
scatter price mpg, mlabel(make)
It looks like something is preventing Stata from passing the figure back to Jupyter. Nonetheless, we can save it in Stata and open it here.
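One thing to watch for: graph export writes into ./graphs/, and Stata will likely complain if that folder does not exist. A minimal sketch of creating it from a Python cell first, assuming Stata's working directory is the same as the notebook's (that is an assumption, not something ipystata guarantees):

```python
from pathlib import Path

# Assumption: Stata's working directory is the notebook's directory,
# so "./graphs" here is the folder that graph export will write into.
graph_dir = Path("./graphs")
graph_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists

print(graph_dir.is_dir())
```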
%%stata -gr
sysuse auto.dta
summ
desc
reg price mpg rep78 headroom trunk weight length turn displacement gear_ratio foreign, r
scatter price mpg, mlabel(make)
graph export "./graphs/price-mpg.png", replace
Let's import the figure to our notebook.
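One way to do this from a Python cell is IPython's Image display class. The sketch below is only an illustration: the helper name load_figure is hypothetical, and it assumes the PNG was written to the path used in the graph export command above.

```python
from pathlib import Path


def load_figure(path="./graphs/price-mpg.png"):
    """Return a displayable image, or None if the file is not there yet."""
    p = Path(path)
    if not p.is_file():
        print(f"{p} not found; run the Stata cell with graph export first.")
        return None
    # Imported lazily; IPython is available inside a Jupyter kernel.
    from IPython.display import Image
    return Image(filename=str(p))


# As the last expression in a cell, the returned image is rendered inline.
load_figure()
```

If the file is missing, the helper prints a short message instead of raising, which makes it easier to tell a failed export apart from a display problem.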
Moving data between Stata and Python
As we have seen, Python is very powerful for data munging and cleaning, and its figures may look much nicer. But since we already know Stata for econometric analyses, let's use both languages to get the best of each. We can do this by passing additional options to %%stata. First, let's get the data from auto.dta out of Stata as a pandas DataFrame.
%%stata -o car_df
sysuse auto.dta
car_df
Some analyses in Python
Now that we have the data in Python, we can do some analyses, merge with other datasets, or create some plots.
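# Import matplotlib (pyplot and ticker explicitly, since the plotting function uses plt and mpl.ticker)
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.ticker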
# Import seaborn
import seaborn as sns
sns.set()
# paths
pathgraphs = './graphs/'
# Define our function to plot
def ScatterPlot(dfin, var0='mpg', var1='price', labelvar='make',
                dx=0.006125, dy=0.006125,
                xlabel='Miles per Gallon',
                ylabel='Price',
                linelabel='Price',
                filename='price-mpg.pdf'):
    '''
    Plot the association between var0 and var1 in dataframe dfin, using labelvar for the point labels.
    '''
    sns.set(rc={'figure.figsize':(11.7,8.27)})
    sns.set_context("talk")
    # Work on a copy and drop rows with missing values in the plotted variables
    df = dfin.copy()
    df = df.dropna(subset=[var0, var1]).reset_index(drop=True)
    # Plot the scatter with a fitted regression line
    fig, ax = plt.subplots()
    sns.regplot(x=var0, y=var1, data=df, ax=ax, label=linelabel)
    # Offset each label slightly so it does not sit on top of its point
    movex = df[var0].mean() * dx
    movey = df[var1].mean() * dy
    for line in range(0, df.shape[0]):
        ax.text(df[var0][line]+movex, df[var1][line]+movey, df[labelvar][line], horizontalalignment='left', fontsize=14, color='black')
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    plt.xlim([df[var0].min()-1, df[var0].max()+1])
    plt.ylim([0, df[var1].max()+1000])
    ax.tick_params(axis = 'both', which = 'major', labelsize=16)
    ax.tick_params(axis = 'both', which = 'minor', labelsize=8)
    # Show prices with thousands separators and no decimals
    ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
    #ax.legend()
    plt.savefig(pathgraphs + filename, dpi=300, bbox_inches='tight')
ScatterPlot(car_df)
Creating some data
car_df['mpg_sq'] = car_df.mpg ** 2
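The line above adds the square of mpg as a new column, so that Stata can later use it as a regressor. To try the same step without a Stata session, here is a small sketch on a stand-in DataFrame; demo_df and its three rows are illustrative values, not the full auto.dta.

```python
import pandas as pd

# Stand-in for car_df: three illustrative rows, not the full auto.dta.
demo_df = pd.DataFrame({
    "make": ["AMC Concord", "AMC Pacer", "AMC Spirit"],
    "price": [4099, 4749, 3799],
    "mpg": [22, 17, 22],
})

# Same transformation as above: add the square of mpg as a new column.
demo_df["mpg_sq"] = demo_df["mpg"] ** 2

print(demo_df[["make", "mpg", "mpg_sq"]])
```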
Analyzing the new data in Stata
%%stata -d car_df
reg price mpg mpg_sq rep78 headroom trunk weight length turn displacement gear_ratio foreign, r
Additional Information
If you want to perform additional tasks between both programs, you can check this example notebook by the author of ipystata or the ipystata website.